AITopics | hardware resource

Collaborating Authors

hardware resource

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

964b1c8dd5667fd647c09c8772829fd1-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 21:47:04 GMT

algorithm, neuroschedule, opération, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Monterey County > Monterey (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
(7 more...)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases

Boullé, Marc, Voisine, Nicolas, Guerraz, Bruno, Hue, Carine, Olmos, Felipe, Popescu, Vladimir, Gouache, Stéphane, Bouget, Stéphane, Bondu, Alexis, Gauthier, Luc Aurelien, Benrekia, Yassine Nair, Clérot, Fabrice, Lemaire, Vincent

arXiv.org Artificial IntelligenceNov-4-2025

Khiops is an open source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has attracted academic interest with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is adapted to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available on many environments, both from a Python library and via a user interface.

artificial intelligence, khiop, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2508.20519

Genre: Research Report (0.64)

Industry:

Education (0.86)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vector Database Perspective

Jiang, Wenqi

arXiv.org Artificial IntelligenceOct-24-2025

Retrieval-augmented generation (RAG) has emerged as one of the most prominent applications of vector databases. By integrating documents retrieved from a database into the prompt of a large language model (LLM), RAG enables more reliable and informative content generation. While there has been extensive research on vector databases, many open research problems remain once they are considered in the wider context of end-to-end RAG pipelines. One practical yet challenging problem is how to jointly optimize both system performance and generation quality in RAG, which is significantly more complex than it appears due to the numerous knobs on both the algorithmic side (spanning models and databases) and the systems side (from software to hardware). In this paper, we present RAG-Stack, a three-pillar blueprint for quality-performance co-optimization in RAG systems. RAG-Stack comprises: (1) RAG-IR, an intermediate representation that serves as an abstraction layer to decouple quality and performance aspects; (2) RAG-CM, a cost model for estimating system performance given an RAG-IR; and (3) RAG-PE, a plan exploration algorithm that searches for high-quality, high-performance RAG configurations. We believe this three-pillar blueprint will become the de facto paradigm for RAG quality-performance co-optimization in the years to come.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.20296

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

NeuroSchedule: A Novel Effective GNN-based Scheduling Method for High-level Synthesis

Neural Information Processing SystemsAug-17-2025, 03:16:00 GMT

High-level synthesis (HLS) is widely used for transferring behavior-level specifications into circuit-level implementations. As a critical step in HLS, scheduling arranges the execution order of operations for enhanced performance.

artificial intelligence, machine learning, opération, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Monterey County > Monterey (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
(7 more...)

Genre: Research Report (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models

Yan, Minghao, Wang, Zhuang, Jia, Zhen, Venkataraman, Shivaram, Wang, Yida

arXiv.org Artificial IntelligenceAug-6-2025

Low-rank Adaptation (LoRA) has gained popularity as a fine-tuning approach for Large Language Models (LLMs) due to its low resource requirements and good performance. While a plethora of work has investigated improving LoRA serving efficiency by serving multiple LoRAs concurrently, existing methods assume that a wide range of LoRA adapters are available for serving. In our work, we conduct extensive empirical studies to identify that current training paradigms do not utilize hardware resources efficiently and require high overhead to obtain a performant LoRA. Leveraging these insights, we propose PLoRA, which automatically orchestrates concurrent LoRA fine-tuning jobs under given hardware and model constraints and develops performant kernels to improve training efficiency. Our experimental studies show that PLoRA reduces the makespan of LoRA fine-tuning over a given hyperparameter search space by up to 7.52x and improves training throughput by up to 12.8x across a range of state-of-the-art LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.02932

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AI-Driven Optimization of Hardware Overlay Configurations

Karakchi, Rasha

arXiv.org Artificial IntelligenceMar-8-2025

Designing and optimizing FPGA overlays is a complex and time-consuming process, often requiring multiple trial-and-error iterations to determine a suitable configuration. This paper presents an AI-driven approach to optimizing FPGA overlay configurations, specifically focusing on the NAPOLY+ automata processor implemented on the ZCU104 FPGA. By leveraging machine learning techniques, particularly Random Forest regression, we predict the feasibility and efficiency of different configurations before hardware compilation. Our method significantly reduces the number of required iterations by estimating resource utilization, including logical elements, distributed memory, and fanout, based on historical design data. Experimental results demonstrate that our model achieves high prediction accuracy, closely matching actual resource usage while accelerating the design process.

configuration, overlay, prediction, (14 more...)

arXiv.org Artificial Intelligence

2503.06351

Country: North America > United States > South Carolina (0.05)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.37)

Add feedback

Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

Tran, Van Duy, Le, Tran Xuan Hieu, Tran, Thi Diem, Pham, Hoai Luan, Le, Vu Trung Duong, Vu, Tuan Hai, Nguyen, Van Tinh, Nakashima, Yasuhiko

arXiv.org Artificial IntelligenceJul-25-2024

Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptions (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification issues, which are a common but significant topic in AI using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement hardware for KAN. The results indicate that KANs cannot achieve more accuracy than MLPs in high complex datasets while utilizing substantially higher hardware resources. Therefore, MLP remains an effective approach for achieving accuracy and efficiency in software and hardware implementation.

dataset, kan, mlp, (13 more...)

arXiv.org Artificial Intelligence

2407.1779

Country:

Asia > Japan (0.05)
Asia > Vietnam (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

COSTREAM: Learned Cost Models for Operator Placement in Edge-Cloud Environments

Heinrich, Roman, Binnig, Carsten, Kornmayer, Harald, Luthra, Manisha

arXiv.org Artificial IntelligenceMar-13-2024

In this work, we present COSTREAM, a novel learned cost model for Distributed Stream Processing Systems that provides accurate predictions of the execution costs of a streaming query in an edge-cloud environment. The cost model can be used to find an initial placement of operators across heterogeneous hardware, which is particularly important in these environments. In our evaluation, we demonstrate that COSTREAM can produce highly accurate cost estimates for the initial operator placement and even generalize to unseen placements, queries, and hardware. When using COSTREAM to optimize the placements of streaming operators, a median speed-up of around 21x can be achieved compared to baselines.

ostream, placement, query, (17 more...)

arXiv.org Artificial Intelligence

2403.08444

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(22 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Efficient and Mathematically Robust Operations for Certified Neural Networks Inference

Geyer, Fabien, Freitag, Johannes, Schulz, Tobias, Uhrig, Sascha

arXiv.org Artificial IntelligenceJan-16-2024

In recent years, machine learning (ML) and neural networks (NNs) have gained widespread use and attention across various domains, particularly in transportation for achieving autonomy, including the emergence of flying taxis for urban air mobility (UAM). However, concerns about certification have come up, compelling the development of standardized processes encompassing the entire ML and NN pipeline. This paper delves into the inference stage and the requisite hardware, highlighting the challenges associated with IEEE 754 floating-point arithmetic and proposing alternative number representations. By evaluating diverse summation and dot product algorithms, we aim to mitigate issues related to non-associativity. Additionally, our exploration of fixed-point arithmetic reveals its advantages over floating-point methods, demonstrating significant hardware efficiencies. Employing an empirical approach, we ascertain the optimal bit-width necessary to attain an acceptable level of accuracy, considering the inherent complexity of bit-width optimization.

algorithm, arithmetic, opération, (17 more...)

arXiv.org Artificial Intelligence

2401.08225

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

Que, Zhiqiang, Fan, Hongxiang, Loo, Marcus, Li, He, Blott, Michaela, Pierini, Maurizio, Tapper, Alexander, Luk, Wayne

arXiv.org Artificial IntelligenceJan-9-2024

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low latency performance. Incorporating FPGA-based GNNs into particle detectors presents a unique challenge since it requires sub-microsecond latency to deploy the networks for online event selection with a data rate of hundreds of terabytes per second in the Level-1 triggers at the CERN Large Hadron Collider experiments. This paper proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. Moreover, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented which not only finds a design with a much better latency but also finds a high accuracy design under given latency constraints. To facilitate this, a customizable template for this low latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to the previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.

initiation interval, latency, matrix, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3640464

2209.14065

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Switzerland (0.04)
Europe > Ireland (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback